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!_H ! 1 INTRODUCTION 



ABSTRACT 

We describe a method to estimate the mass distribution of a gravitational lens and the 
position of the sources from combined strong and weak lensing data. The algorithm 
combines weak and strong lensing data in a unified way producing a solution which is 
valid in both the weak and strong lensing regimes. We study how the result depends 
on the relative weighting of the weak and strong lensing data and on choice of basis 
to represent the mass distribution. We find that combining weak and strong lensing 
information has two major advantages: it eliminates the need for priors and/or regular- 
ization schemes for the intrinsic size of the background galaxies (this assumption was 
needed in previous strong lensing algorithms) and it corrects for biases in the recovered 
mass in the outer regions where the strong lensing data is less sensitive. The code is 
implemented into a software package called WSLAP (Weak & Strong Lensing Analysis 
Package) which is publicly available at http://darwin.cfa.harvard.edu/SLAP/. 

Key words: galaxies:clusters:general; methods:data analysis; dark matter 
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Lensing problems usually distinguish between two regimes, 
strong and weak. In the strong lensing regime, a background 
source galaxy appears as multiple images, while in the weak 
lensing regime, its image suffers a small distortion which 
typically elongates it in a direction orthogonal to the gradi- 
ent of the potential. The two problems are normally studied 
separately and, at best, they are combined afterward. Only 
a few attempts have been made to combine both regimes 
in the same analysis (e.g. Bradac et al. 2005, Broadhurst, 
Takada, Umetsu et al. 2005b). 

The quality and quantity of strong and weak lensing 
data is growing rapidly, motivating the use of algorithms 
capable of making full use of the amount of information 
present in the images. In the early years of strong lensing 
data analysis, it was common to have only few constraints to 
work with. The small number of constraints made it impos- 
sible to extract useful information about the mass distribu- 
tion of the lens without invoking a simple parametrization 
of the lens or the gravitational potential (Kneib et al. 1993, 
1995, 1996, Broadhurst et al 1995, 2005a, Sand et al. 2002, 
Gavazzi et al. 2004). The common use of parametric mod- 
els requires making educated guesses about the cluster mass 
distribution, for instance that the dark matter halos trace 
the luminosity of the cluster or that galaxy profiles possess 



certain symmetries. 

Nowadays, it is possible to obtain strong lensing im- 
ages around the center of galaxy clusters with hundreds of 
arcs (Broadhurst et al. 2005a), where each arc contributes 
with several effective constraints in the process of solving 
for the projected mass distribution of the lens. In addition, 
weak lensing measurements provide shear constraints over 
a larger field of view. When added together, the number 
of constraints can be sufficiently high that non-parametric 
methods can be used in the reconstruction of the mass. With 
such a large number of constraints, non-parametric methods 
have a chance to compete with the parametric ones, comple- 
menting their results and raising interesting questions if sig- 
nificant disagreements are found between the two method- 
ologies. 

Non-parametric approaches have been previously ex- 
plored in several papers (Saha et al. 1997, Abdelsalam et 
al. 1988b, 1998c, Trotter et al. 2000, Williams & Saha 2001, 
Warren & Dye 2003, Saha & Williams 2004, Bradac et al. 
2005, Treu & Koopmas 2004) and more recently in Diego 
et al. (2005a) (hereafter paper I). In paper I, the authors 
showed that it is possible to non-parametrically reconstruct 
a generic mass profile (with substructure) provided that the 
number of strongly lensed arcs with known redshifts is suf- 
ficiently large. They developed a package called SLAP upon 
which WSLAP is based. Paper I also showed how work- 
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ing with extended images rather than just their positions 
adds enough constraints to solve the regularization problem 
found in other non-parametric algorithms (see Kochanek et 
al. 2004 for a discussion of this issue). An application of the 
method using SLAP on data from A1689 can be found in 
Diego et al. 2005b (hereafter paper II). 

Most of the literature on lensing observations is based 
on either weak or strong lensing data. Only few papers have 
attempted to combine both weak and strong lensing data 
(e.g Bradac et al. 2005, Broadhurst, Takada, Umetsu et al. 
2005b). The main advantage of combining both regimes is 
that they complement each other, filling the gaps and cor- 
recting the deficits of each other. Strong lensing data is par- 
ticularly sensitive to the central mass distribution of the 
lens but is relatively insensitive to the outer regions. On the 
other hand, weak lensing cannot capture the fine details in 
the central regions but can trace the mass distribution fur- 
ther out than strong lensing data. One of the problems of 
modeling weak lensing data is the so-called mass-sheet de- 
generacy. Strong lensing can break this degeneracy if several 
arcs are observed and the sources of these arcs span a wide 
range of redshifts. Some algorithms have been proposed for 
combining the weak and strong lensing regimes (Abdelsalam 
et al. 1998a; Bridle et al. 1998; Saha, Williams and Abdel- 
salam 1999; Kneib et al. 2003; Smith et al. 2004; Bradac et 
al 2004), but usually they need to assume a prior on the 
mass (Kneib) or luminosity (Abdelsalam) or regularize the 
problem (Bradac, Abdelsalam). One of the purposes of this 
paper is to show how the above assumptions can be elimi- 
nated and that a well defined likelihood can be defined for 
the combined weak and strong lensing data set. 



2 WSLAP AND STRONG LENSING 

The fundamental problem in lens modeling is the following: 
Given the Ng positions of lensed images, 9, what are the 
corresponding positions (3 of the background galaxies and 
the mass distribution m(0) of the lens? Mathematically this 
entails inverting the lens equation 



/3 = 0-a(6,m(6)) 



(1) 



where a(6) is the deflection angle created by the lens which 
depends on the observed positions, 6. Each observed position 
9 contributes with two constraints, 9 — (9 x ,0 y ), so we have 
2Ng strong lensing constraints. 

The deflection angle a at the position 9, is found by 
integrating the contributions from the whole mass distribu- 
tion: 



a(6) 



= 4G_D^ f 
c 2 D.Di J 



m(9') 



\0-9'\ 



-de' 



(2) 



where Di 3 , Di, and D s are the angular distances from the 
lens to the source galaxy, the distance from the observer to 
the lens and the distance from the observer to the source 
galaxy respectively. In equation (2) we have made the usual 
thin lens approximation so the mass m(0') is the projected 
mass along the line of sight 6' . From the deflection angle one 
can easily derive the magnification produced by the lens at 
a given position: 



»-\e) = i 



da x 
dx 



da v 



da x da v da x da v 



dy dx dy dy dy 



(3) 



We find it convenient to expand the projected mass dis- 
tribution in an set of basis functions: 



m(x,y) = ~y]cifi(x,y), 



(4) 



where fi(x,y) are the basis functions and C; the coefficients 
of the decomposition. Here fi(x,y), can be any sort of 2D 
function. For instance, one can choose orthogonal polyno- 
mials like the Legendre or Hermite polynomials. Or one can 
use Fourier or wavelet functions as the basis. We find that 
the best results are obtained using compact basis functions 
defined on a gridded version of the mass distribution like 
the ones used in papers I and II, since using extended ones 
tends to over-produce arcs in the final result — see however 
Sandvik et. al. (in preparation) for a novel approach to this 
problem. In papers I and II we used for fi Gaussians with 
varying widths defined in a multi-resolution grid. In this pa- 
per we will focus on compact bases and will compare the 
results using three different compact bases. 

After decomposing the mass as in equation (4), equation 
(2) can be rewritten as 



0,(0,-) 



fi(0') 



\0-0'\ 



A^q/H^), (5) 



where all the constants and distance factors are absorbed 
into the variable Aj. Note that there is a different Aj for 
each source since Aj includes the distance factors Di, D s 
and Dis which vary for each source. The factor fi(6j) is the 
convolution of the basis function fi with the kernel (e — 
e')/\e - e'\ 2 evaluated at the point 9: 



fi(9j) = J fi(0') 



T d9'. 



\9-9'\ 2 

If now we define the matrix T by 



(6) 



(7) 



then all the constraints given by equation (1) can be ex- 
pressed in the simple form 



= Yc -(3. 



(8) 



where is the array (vector) containing all the e positions 
(e x and 0y). The matrix T has a straightforward physical 
interpretation: the element Tji is just the deflection angle 
created by the basis function /; at sky position 9j . Note that 
since 0j has two components (the x and y components), the 
are two corresponding elements in T. 

If we group all the unknowns in our problem (both (3 
and c or the mass in each cell) into a new vector x, then 
equation (8) can be rewritten in the more compact form 



© = Ax, 



(9) 



where A is a 2Ng x (N c + 2A s )-dimensional matrix and 
x is the (N c + 2A s )-dimensional vector containing all the 
unknowns in our problem (see paper I), i.e., the N c cell 
masses itij (or coefficients q), and the 2N S central positions 
f3 (x and y) of the N a sources. 



3 ADDING WEAK LENSING 

So far we have focused on solving a system of linear equa- 
tions corresponding to strong lensing data. If weak lensing 
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information is available, it can be easily incorporated into 
equation (9), allowing us to find the combined solution of 
the weak plus strong lensing as we will see below. 

Given the gravitational potential ip, the shear is defined 
in terms of the second partial derivatives of the potential ip 
(the Hessian of t/>): 



ipij = 

7i (0) = 
72(e) = 



-(ipll ~ V>22) = y(6) cos[2p], 
1P12 = ip2i = 7(0)sin[2c£>], 



(10) 

(11) 
(12) 



where 7(0) is the amplitude of the shear and cp its orienta- 
tion. The shear can be computed in a way very similar to 
the magnification /1 yielding 



71 = 2 



f f da x da y 
dx dy 



72 = 



da x 



da v 



(13) 



(14) 



dy dx 

The amplitude and orientation of the shear are given by 

(15) 

(16) 



7 = VtF+tF. 

1 . /72 s 

(fi = -atan( — , 

2 71 



Given a number of shear measurements, an equation 
similar to (8) can be written for the shear: 



71 

72 



Ai 
A 2 



(17) 



where each element in the matrices Ai and A2 represents 
the contribution to the shear (71 and 72 respectively) of 
each one of the basis functions. The expression for can 
be easily derived from equations (7), (13) and (14). The ex- 
plicit form of Aij is given in Diego et al. (2005d). 

After combining the strong and weak lensing regimes by 
regrouping the observed ^-positions of the strongly lensed 
galaxies and the measured shear, the new measurement vec- 
tor ^ will have the structure 



(#*, 6^,71, 72), 



(18) 
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and the corresponding system of linear equations represent- 
ing the lens equation reads 



(19) 



where we have explicitly expanded the matrix A and the 
vector of unknowns x into their components. In the above 
equation, the matrix contains all zeros while the ij ele- 
ments in matrix I x are ones if the 6i pixel (^-coordinate) 
is coming from the (3j source (j/-coordinate) and zero other- 
wise. The matrix I y is defined in an analogous way for the 
^/-coordinates. The above equation written in compact form 
is simply 

= Tx. (20) 
In summary, we have formulated the full weak and strong 



lensing problem in a manner were the observables $ depend 
linearly on the unknowns x, so all the complicated physics 
and geometry is conveniently encoded into the known matrix 

r. 

In principle, an exact solution for x exists if the inverse 
of r exists (i.e x = T _1 $). However, in most cases, T is 
singular and therefore does not have an inverse (some of the 
eigenvalues are basically zero within rounding errors), so a 
direct inversion of the problem is not possible. Furthermore, 
even when the inverse of T exists, we may not be interested 
in finding the exact solution, but rather in an approximate 
solution of equation (20). The reason is twofold. The defi- 
nition of x assumed that the source galaxies responsible for 
the strong lensing arcs are point-like (that is, each source is 
defined only by its coordinates, (3 X and /3 y ). This assumption 
is inaccurate as the galaxies will have some spatial extent, 
so we want the solution to allow for some residual in equa- 
tion (20). Second, for the mass we have assumed that it is 
a superposition of certain basis functions, say cells. This as- 
sumption, although a good approximation, is also partially 
inaccurate, so we want to incorporate this in our analysis by 
allowing some residual (|r| > 0) in the lens equation. This 
residual is defined as 

r = # Tx. (21) 



4 SOLVING THE LENS PROBLEM 

The fundamental task we are faced with is to obtain the 
coefficients c, describing the lens surface mass density, and 
the positions j3 of the background galaxies in order for their 
combination to explain the observed arcs 9 and the shear 7. 
In the previous section, we have shown how the unknowns of 
the problem can be combined into a vector x, the observed 
data into another vector, $, and the connection between the 
two is given by the matrix T. These three elements relate to 
each other through the system of linear equations (20). 

We adopt a Bayesian approach to solving the problem, 
finding the solution x that maximizes the likelihood func- 
tion. 



£(x) 



(22) 



where we have assumed that the residual r is Gaussian dis- 
tributed. The x 2 is defined as 



2 t/~i — 1 
X = r C r, 



(23) 



where C is the covariance matrix of the residual r. We model 
the residuals as uncorrelated (C is diagonal) and that the 
elements on the diagonal are equal to either a\ or , where 
the former is associated with the expected residual variance 
in the strong lensing data and the latter is for the expected 
residual variance in the shear measurements. 

We will extensively discus how to best choose C below 
in Section 6.2. For the main calculations in this paper, we 
assume that the rms error ae is of the order of a few pixels 
in the source plane, and model 07 as uniform over the field 
of view, equal to 0.005 for both the 71 and 72 components. 
Errors in shear measurements can be in the range of a few 
percent for well calibrated experiments (e.g Hirata et al. 
2005, Heymans et al. 2005). A value of ct 7 = 0.005 is at the 
percent level for a typical cluster. The covariance matrix C 
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can also incorporate a measure of the noise in the data both 
in the strong and weak lensing. 

We will now explore two alternative approaches for find- 
ing the solution that maximizes the likelihood (minimizes 
the x 2 )- 



4.1 Bi-conjugate Gradient Method 

Substituting equation (21) into equation (23), 



x = (#-rx)'c _1 (#-rx) 



(24) 



2$ f C Tx + x'r'C Tx 



= b — a*x + -x'Ax, 

where we have defined the constant b = #'C~ the vector 
a = 2r*Cr 1 # and the matrix A = 2r*CT 1 r. 

Minimizing this by setting the derivative with respect 
to x equal to zero gives a formal solution x = A _1 a. This is 
not useful in practice, however, since T (and therefore also 
A) is normally rather singular. In paper I, we found a sim- 
ple regularization technique that gives physically reasonable 
results: minimizing \ 2 using the iterative bi-conjugate gra- 
dient method (Press et al. 1997), but stopping once an ap- 
proximate solution of equation equation (20) had been found 
rather than continuing to iterate toward the formal solution. 
Specifically, the bi-conjugate gradient method performs suc- 
cessive minimizations which are carried out in a series of 
orthogonal conjugate directions with respect to the metric 
A. The algorithm starts with an initial guess for the solu- 
tion (for instance xo = 0). Then the algorithm chooses as a 
first minimization direction the gradient Vx 2 at xo. Then it 
minimizes in directions which are conjugate to the previous 
ones until it reaches the minimum or the x is smaller than 
certain target value e. We will discuss later how to choose 
e — we will find that combining weak and strong lensing 
makes the choice of e much less relevant than when only 
strong lensing data is used in the analysis. 



4.2 Nonnegative quadratic programming 

Although the bi-conjugate gradient method is a fast and ef- 
fective way to find an approximate solution, it is not ideal. 
The regularization procedure was required because certain 
modes in the mass distribution corresponded to eigenval- 
ues near zero in the matrix A. Plotting these unconstrained 
modes shows that they all oscillate, trading off positive mass 
in some places against unphysical negative mass elsewhere. 
Without regularization, the solution can include such modes 
of significant amplitude, involving negative mass in certain 
cells. Both the regularization problem and negative mass 
problem can therefore be eliminated in one fell swoop by 
using a constrained minimization algorithms that only min- 
imizes x 2 m the physically meaningful region of the param- 
eter space c where all masses are non-negative. Our case 
is particularly simple: we are minimizing a quadratic func- 
tion of x subject to constraints on x that are linear. This 
is a well studied problem in optimization theory, and sev- 
eral methods have been proposed in the context of quadratic 
programming (QADP). In this paper we will explore the ap- 
proach of Sha, Saul & Lee (2002) known as multiplicative 
updates for nonnegative quadratic programming. 



Following Sha et al. (2002), we can minimize the 
quadratic objective function 



/(x) = ix'Ax + a'x 



(25) 



subject to a non-negative mass constraint, i.e., the con- 
straint that rrii > for all i, where rrii is the mass at po- 
sition i. Note that when compared with equation (24), we 
have changed the sign of vector a to keep the same notation 
of Sha et al. (2002). In our vector x, the mass distribution 
is represented by expansion coefficients c rather than cell 
masses m, and equation (4) shows that these two vectors 
are related by 



Fc, 



(26) 



where the element F« is the value of the basis function /; 
at position i. We can therefore make the substitution x = 
F _1 m in equation (25) and rewrite it as a function of a 
transformed vector denoted x' which equals x except that 
the elements defining the mass distribution are m = Fc 
rather than c: 



j (x ) = — x Ax +a x , 



(27) 



where a' is the same as a but with the elements related to 
masses multiplied by F _1 , and A' is the same as A but 
with the sub-matrix related to masses multiplied by F _1 
both from the left and from the right. In general, the di- 
mensionality of x' can be different than the dimensionality 
of x. However, to keep the problem simple, we assume that 
the masses in equation (26) are evaluated only at the central 
position of each cell. That makes the dimensions of x and 
x 1 equal and rrii can be interpreted as simply the total pro- 
jected mass in the i th -cell. Since the positions (3 can be also 
made positive (by defining the origin of the coordinates in 
the left bottom corner of the field of view), all components 
in the vector x' (rrii and (3j) have to be positive. 

In conclusion, we wish to minimize equation (27) sub- 
ject to the constraints that all elements x\ > 0. We solve this 
problem iteratively using the multiplicative update tech- 
nique of Sha, Saul & Lee (2002). For simplicity of notation, 
we suppress all primes from equation (27) below. Let us split 
the matrix A into its positive and negative parts A + and 
A such that A = A 
and otherwise and A 



A , where At = A i3 if A i3 > 
—An if An < and otherwise. 



The solution is iteratively updated by the rule 

x i+ i = XiSi, (28) 

where the multiplicative updates 5i are defined by 



5i = 



-Oi + + 4(A + x),(A x)~ 
2(A+x)i 



(29) 



It is easy to see that generic quadratic programming 
problems have a single unique minimum. Let x* denote this 
global minimum of /(x) (within the non-negative mass part 
of parameter space). Let us prove that convergence of the 
iteration equation (29) corresponds to this minimum x* . At 
this point, one of two conditions must apply for each com- 
ponent x*: Either (i) x* > and J^-(x*) = or (ii) x* = 

and J^-(x*) > 0. Now since 
^(x*) = (A+x) i -(A-x) i + o i , 



(30) 
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Figure 1. Original mass used to test the algorithm (at z=0.4). The mass is built out of a superposition of 30 NFW profiles with added 
cllipticities. The total mass in the field of view (6 arcmin) is 1.17397 X 10 15 /i — 1 Mq. The left panel shows the central region (3 arcmin, 
0.663 X 10 15 /i~ 1 Mq) and the strongly lensed galaxies. These arcs are lensed images of 9 galaxies between redshifts 1 and 6.5. The right 
panel (6 arcmin) shows the shear field and the outer regions of the cluster. 



the multiplicative updates in both cases (i) and (ii) take the 
value Si — 1, the minimum is a fixed point. Conversely, a 
fixed point of the iteration must be the minimum x*. 



5 SIMULATIONS 

Testing the algorithm with simulations is essential, not only 
to prove its feasibility but also to identify its failures and 
weaknesses. In this paper, we will show some results using a 
simulated cluster with a particularly rich structure. The mo- 
tivation for this is twofold. First, using a highly asymmetric 
distribution motivates the use of non-parametric methods 
where no assumptions about the distribution of the mass 
are needed. Second, asymmetries may play a role introduc- 
ing biases in the result which we may want to study. 
Also, with simulations we can test how different choices for 
fi and C affect the result. This last step is important since 
fi and C arc basically the only assumptions made in the 
process of fitting the data. 

The simulated data are made of a combination of 3 ba- 
sic ingredients: 

1) the lens mass distribution that we will try to recover, 

2) the arcs observed in the central region of our field of view 
that will constitute the strong lensing part of the data, 

3) the shear measured over the entire field of view which will 
constitute the weak lensing part of the data. 



5.1 Mass distribution 

In order to test the algorithm, we will use a simulated clus- 
ter with abundant internal structure. The cluster is placed 
at redshift z=0.4. It has a highly elliptical extended large 
scale component at large scale and the central region has 
several clumps surrounding the central peak. These clumps 
are generated from NFW profiles with added ellipticities. 
There is also a filamentary component crossing the field of 
view. The simulated cluster has a total projected mass of 
1.174 x 10 15 /i _1 Mq over the field of view (6 arcmin) and is 
shown in figure 1. This field of view corresponds to a scale of 
1.35 hT 1 Mpc and normally covers more than half the virial 
radius expected for clusters with this mass. 



5.2 Strong lensing data 

To generate the arcs, we place several sources behind the 
cluster. The sources have redshifts between z — 1.0 and 
z — 6.5. We consider 9 sources in this redshift range. The 
arcs produced by the combination lens-sources are shown in 
figure 1 (left panel). These arcs will constitute the strong 
lensing part of our data set. We use all the pixels containing 
part of one arc in the previous image. There are 621 of these 
pixels. All the sources have at least two lensed images in the 
previous plot. Some sources appear as many as five times. 
Although we search for multiple images only in the central 
part of the field of view (3x3 arcmin 2 ), we use the mass 
over the entire field of view (6x6 arcmin 2 ) to calculate the 
deflection angle. 
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5.3 Weak lensing data 

For the weak lensing part, we calculate the shear field 
over the entire 6x6 arcmin 2 field of view (or equivalently 
1.35 x 1.35 (h _1 Mpc) 2 ). The shear is simulated assuming 
the sources have a medium redshift of z = 3. We consider 
that the observation is deep enough and that a shear mea- 
surement can be obtained after averaging areas of 14.5 x 14.5 
arcsec 2 which renders 625 shear measurements over the field 
of view (625 71 and 625 72). The shear field is shown in the 
right panel of figure 1. 

Summarizing, the strong lensing data consist of Ng — 
621 pixels distributed in about 40 strongly lensed images (or 
arcs) coming from 9 sources. Each pixel contributes as two 
data points (9 X and 9 y ). The shear is computed on a JV 7 = 
25 x 25 grid over a field of view expanding 6 arcmin. Each 
shear measurement contributes also with two data points (7/ 
and 71 ) The data vector, $ is then an N-dimensional vector 
with N = 2N g + 2N~, = 2x612 + 2x625 = 2492. The number 
of unknowns N x is the number of cells (or basis) N c plus two 
times the number of sources N s (the factor 2 coming from 
the x and y component), N x = N c + 2N a = 2x 500 + 2x9 = 
1018 where we have assumed that the lens plane has been 
divided in 500 cells. The matrix A (= r*C _1 r) is a N x x N x 
matrix, and the vectors x and a = r'C -1 <? have dimension 
N x (A and a are defined below equation 24). 



6 RESULTS 

As in papers I and II, we start the minimization process as- 
suming we know nothing about the mass distribution and 
use a regular grid to divide the lens plane. Also, as explained 
in paper I, a regular grid has the inconvenience that the 
small details of the mass distribution can not be described 
with enough accuracy. That means, the lens is less adapt- 
able and will have problems reproducing the data. To avoid 
getting a very biased solution, the minimization process has 
to be stopped earlier than in the case where the grid repro- 
duces finer details (bigger e). Otherwise we will end up with 
an unphysical solution which tries to fit the data superpos- 
ing big "chunks" of dark matter in the lens plane. Figure 2 
shows the result after the first iteration. It also shows the 
grid used to decompose the plane of the surface mass into 
cells (N c — 256 cells). The first iteration finds an elliptical 
distribution of mass in the right location but is unable to 
unveil any of the finer details of the mass distribution. The 
total mass in this first iteration is smaller than the original 
mass by 20%. Once we have a guess for the mass distri- 
bution, the adaptive grid can be constructed by splitting 
the cells with higher densities into smaller cells. Cells are 
split in an iterative process which subdivides the cells hav- 
ing higher densities into four smaller sub-cells. The splitting 
procedure stops when the goal number of cells is achieved, 
say N c = 500. Each time a new grid is built, the T matrix 
has to be recomputed again. Each minimization step (new 
grid + new T + new solution) usually takes about 10 sec- 
onds on a 1 GHz processor. In figure 3 we show the result 
after 10 minimizations. The number of cells used in this case 
was N c — 500. Note how the recovered mass reproduces well 
most of the original structure up to the limits of the field of 
view (compare with figure 1). 




Figure 2. Recovered mass (6 arcmin) after first minimization 
(regular grid). The total mass is 20% smaller than the true one. 



L 




Figure 3. Recovered mass (6 arcmin) after 10 minimizations 
(multi-resolution grid). The total mass is 2% smaller than the 
true one. This result was obtained using the bi-conjugatc gradi- 
ent algorithm (1-2 seconds). The minimization is stopped when 
X 2 ~ 10~ n . Using quadratic programming, the result is very 
similar (see figure 4) but it takes several hours to converge. 

Minimizing \ 2 using the nonnegative quadratic pro- 
gramming algorithm described above renders very similar 
results but the process can take up to several hours to con- 
verge. The main advantage of using QADP is that the so- 
lution converges to a mass distribution which is less biased 
with respect to the true mass than the point source solu- 
tion given by the bi-conjugate gradient algorithm. In figure 
4 we show the results obtained with QADP in the differ- 
ent scenarios, using SL data alone, using WL data alone 
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Figure 4. Results obtained with the QADP algorithm and using 
a regular grid of 32 X 32 cells. The field of view is 6 arcmin in 
all cases. From top to bottom and left to right, original mass, 
reconstructed mass with SL data only, reconstructed mass with 
WL only, and reconstructed mass with combined WL and SL 
data. The numbers at the bottom of each panel is the total mass 
in the field of view. QADP was left running until convergence 
(relative change in the total mass of 10~ 9 ). 

and combining both. The combination gives a better re- 
constructed mass than the other two. Note also, how using 
WL data alone over-predicts the total mass by almost 30% 
while the combination overpredicts the mass by only 12%. 
QADP does not suffer of the regularization problems of bi- 
conjugate gradient. There is no point source solution and the 
algorithm can be left running until convergence is achieved. 
The results presented above, correspond to the solution at 
the convergence point (relative change in the total mass less 
than 1(T 9 ). 

By comparing these results with the ones in papers I 
and II we see an improvement in the recovered mass profile. 
First, adding weak lensing allows the reconstruction to ex- 
tend much further than the case where only strong lensing 
data is used. Second, in papers I and II we showed how the 
results may depend on the specific choice of e. In particular, 
we showed that setting a very small e produces a solution 
where the recovered sources are too small. This solution was 
called the point source solution in the previous papers I and 
II. Adding weak lensing partially solves this problem (see fig- 
ure 5). The dependency with e is much weaker when weak 
and strong lensing are combined together. When only strong 
lensing is used in the minimization (see papers I and II) , the 
bi-conjugate gradient naturally tends to increase the mass 
in the center of the lens so the sources get more compressed 
in the center of the image (smaller \ 2 )- Adding weak lensing 
avoids the mass to grow too much in the center since that 
would not reproduce properly the observed shear field. On 
the other hand, using weak lensing alone has the potential 
problem of the mass-sheet degeneracy. Adding strong lens- 
ing acts as a regularizing component since a very specific 
amount of mass is needed in the central region to focus the 




Figure 5. Point source solution in the 6 arcmin field of view 
obtained with the biconjugate gradient algorithm (absolute mini- 
mum of x2 £S 10 ). The total mass is only 10% larger than the 
true one. 

big arcs into compact sources at different redshifts. 

Another important difference with papers I and II is 
that they used no covariance matrix (or more specifically, 
they assumed that C = I). The main reason to introduce a 
covariance matrix in the present paper is to properly weight 
the strong and weak lensing data. The covariance matrix 
can be also viewed as a way to allow for the instrumental 
noise and systematic error to play a role in the strong and 
weak lensing data, making one data set more relevant than 
the other if their measurements are more accurate. 



6.1 Dependence on the covariance matrix, C 

The covariance matrix C controls which information is more 
relevant in the x 2 . The main advantage of a combined 
weak+strong lensing analysis is that we can get both the 
gradient of the mass distribution up to large radii from the 
weak lensing part and the overall mass normalization plus 
detailed internal distribution from the strong lensing part. 
The two regimes are properly weighted through the covari- 
ance matrix, C. Giving more importance to the strong lens- 
ing data will produce a better estimate in the central re- 
gions but will produce a result relatively insensitive to the 
outer regions. On the other hand, increasing the relevance 
of the weak lensing will constrain better the outer regions 
but at the expense of losing accuracy in the normalization. 
A good example of this is shown in figure 6. In this exam- 
ple we vary the amplitude of the covariance of the shear 
map by two orders of magnitude. Making the shear covari- 
ance smaller increases the relative importance of the weak 
lensing in the minimization. On the other hand, increasing 
the shear covariance, reduces the overall importance of the 
weak lensing part. We found that values of the strong lensing 
and shear covariance around ere = 10 -6 rads (~ 2 arcsec) 
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Figure 6. Total mass as a function of iteration number inside the 
bi-conjugate gradient routine. The true mass is shown as a flat 
dotted line. The three solid lines correspond to 3 assumptions 
on the covariance matrix C. Middle curve, erg = 3 pixels and 
cr-y = 0.005. Upper curve, <r 7 is 10 times smaller (WL dominates). 
Lower curve, <r 7 is 10 times larger (SL dominates). The upper x- 
axis shows the corresponding value of the \ 2 a t each iteration. 
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Figure 7. Histograms of the elements of T y (solid) and A2 
matrices. T y has been multiplied by a factor 500. This factor 
is roughly the ratio between erg (3 pixels or 2.1 arcsec = 10~ 5 
rad) and <r 7 (0.005) in the covariance matrix C. These values 
for erg and <r 7 make the C J Y and C — X A to have more or less 
the same variance. We refer to this case as the equal variance 
approach. The histogram of T y extends up to ±40. Large values 
in T occur when the shear is measured near the caustics. 



and <r 7 = 0.005 produce an unbiased estimate of the total 
mass (see figure 6). The ratio of these two values is equal 
to g j = 0.005/ue = 500. This ratio makes the contribution 
of the weak and strong lensing more or less equal in the \ 2 
(see figure 7) . We call this the equal variance approach since 
the dispersion of the C X Y and C -1 A is more or less the 
same (see figure 7). Also in figure 6 we show the evolution 
of x 2 with the iteration number. The \ 2 decreases quickly 
in the first iterations and reaches a plateau afterward. The 
bottom of the plateau is at \ ~ 10~ 13 . This solution cor- 
responds to the point source solution identified in papers I 
and II (see figure 5) . In opposition to what happened in the 
previous papers I and II, the point source solution does not 
deviate much from the real mass distribution. This is an im- 
portant improvement since it shows how the combination of 
weak and strong lensing stabilizes the solution and waives 
the need of any prior on the size of the sources (this prior 
was needed in papers I and II). 

The recovered 1-dimensional profiles show clearly the 
effect of changing the relative weights of the shear and strong 
lensing data. In figure 8 we show the same cases as in figure 
6. The upper thin solid line corresponds to the case where 
the weak lensing is given more relative weight while the lower 
thin solid line is the opposite case where the strong lensing 
is given more importance. The two middle profiles (dotted 
and dashed line) correspond to the intermediate case where 
the histograms of Y and A (re-scaled by C _1 ) share more 
or less the same scale (figure 7) . 

6.2 Alternative choices for C 

So far we have considered only the case where ag and cr 7 are 
constants. The equal variance approach consists on choosing 
ct 7 such that the dispersion of the C X Y and C _1 A ma- 
trices are more or less the same. In other words, this choice 
for C _1 makes the contribution from the weak and strong 
lensing more or less equal (when Ng w iV 7 ). This choice pro- 
duces satisfactory results as we have seen above. 



Since C can be seen as the matrix containing the covariances 
of the data points, one may feel tempted to play with differ- 
ent weights for the data set. For instance, one may consider 
giving more relative importance to the smaller radial arcs 
than to the bigger tangential arcs. This is motivated by the 
fact the the residual of the strong lensing part is more clearly 
dominated by the big tangential arcs than by the small ra- 
dial ones. We have tried different weighting factors in the 
matrix C and found that the best results are obtained when 
the weight of the strong lensing data, ag, is homogeneous 
over the field of view , that is, all data points are given the 
same importance independently or whether they are forming 
part of a giant arc or a tiny radial arc. Weighting the radial 
arcs more than the tangential ones produces biased results 
in the recovered mass distribution, included the position of 
the central peak. A good result is obtained also when the 
weight is proportional to the fraction of pixels in the system 
compared with the total number of pixels in all systems. In 
this case, the results are very similar to the ones obtained 
with an homogeneous weight in C. 

6.3 Dependence on the basis /; 

In this section, we will discuss the role of the basis functions 

/i used to decompose the mass (equation 4). 

We found that in general compact basis give better results 

than extended ones. As an example, in figure 9 we show 

the reconstructed profiles using three different sets of basis 

functions: 

i) A Gaussian basis centered in each cell with a width, a, 
equal to two times the size of the cell, 

G(r) <x cxp(-r 2 /2cr 2 ). (31) 

ii) An isothermal sphere with a core of the same scale a, 
J( r ) oc _J_. (32) 
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Figure 8. Original profile (thick solid line) vs reconstructed ones. 
Dotted line is the recovered profile using the fast bi-conjugatc 
gradient algorithm while dashed line is the result obtained with 
QADP. The top thin solid line corresponds to the upper curves 
of figure 6 (WL dominated case). The bottom thin solid line cor- 
respond to the lower curve of figure 6 (SL dominated case). Note 
how the strong lensing dominated analysis reproduces better the 
central peak but fails in the tails and how the situation reverses 
when we increase the relative importance of the weak lensing in 
the covariancc matrix C. 
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Figure 9. Original profile (thick solid line) vs reconstructed 
ones. From top to bottom. Using as basis /; isothermal spheres 
(dashed), power laws (r -2 ) (dotted) and Gaussians (thin solid 
line). The inner plot shows the three basis for the same scale 
(j = l. Gaussian (solid), power law (r 2 + o") -1 (dotted) and 
isothermal (r + cr) _1 (dashed). Basis with extended tails act 
adding a constant surface mass density to the overall mass. Com- 
pact functions like the Gaussian can concentrate the mass closer 
to the cell where they are positioned. 



iii) A power law also with a core of the same scale a, 



The results obtained with the isothermal sphere and the 
power law show a constant sheet excess in the surface mass 
density which is probably due to the extended tails of the 
basis. These two basis reproduce well the central parts but 
fail in predicting the right density in the outer regions. This 
behavior may be a manifestation of the mass-sheet degener- 
acy. 



7 CONCLUSIONS 

In this paper, we have presented a way of consistently 
combining strong and weak lensing using a non-parametric 
method (WSLAP) which does not rely on any prior on the 
luminosity and does not suffer from significant regulariza- 
tion problems. Finding the solution through the bi-conjugatc 
gradient still is affected by minor regularization problems as 
the minimization has to be stopped before the absolute min- 
imum is reached. This is needed to avoid the point source 
solution. However, we have seen how even the point source 
solution can be a good estimation of the mass when weak 
and strong lensing are combined. In previous papers using 
only strong lensing, we found that the point source solution 
obtained with the bi-conjugate gradient was a bad estimate 
of the mass. On the other hand, the solution obtained with 
QADP does not suffer of regularization problems. Imposing 
the constraint that the masses have to be positive is a nat- 
ural way to regularize the solution. 

Adding weak lensing has two major effects on the solu- 
tion; i) when minimizing the quadratic function with stan- 
dard algorithms (for instance the bi-conjugate gradient) the 



result is basically insensitive to the threshold e where the 
minimization is stopped since the negative masses which ap- 
pear when e is too small can not reproduce the shear field 
properly, ii) the profile can be better reproduced inside and 
beyond the position of the big arcs. The weak lensing data 
allow us to eliminate the use of any prior on the physical size 
of the sources and to better constrain the range of solutions, 
thus adding more robustness to the final result. 

The method allows the freedom to make two choices, 
for the covariance matrix C and the basis functions /; , and 
we quantified the impact of both on systematic errors in 
the mass reconstruction. We found that the equal variance 
approach for the covariance matrix renders satisfactory re- 
sults. Giving more relative importance to the radial than to 
the tangential arcs produces a biased solution for the mass. 
Weighting the arc systems proportional to their area in the 
sky produces similar results as in the case of the equal vari- 
ance approach. Regarding the basis functions, we found that 
functions /; which are compact produce better results than 
extended functions, specially in describing the weak lensing 
part of the data. This fact may be a manifestation of the 
mass-sheet degeneracy in the weak lensing data. 

This paper is more of an illustration of how to extend 
the methodology of SLAP (papers I and II) to include weak 
lensing than a detailed description of the capabilities and 
failures of the WSLAP approach. However, although an il- 
lustration, this paper demonstrates the usefulness of non- 
parametric methods when combining weak and strong lens- 
ing. 

Much work needs still to be done to address possible 
systematic issues, but as described in paper II, most of this 
work will have to be done when WSLAP is applied to real 
data. The systematics may depend on the specific nature of 
the problem (number of sources, geometry and redshift of 
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the lens, quality of the data). Future improvements will in- 
clude adding photometric information and a better modeling 
of the sources (Sandvik et al. in preparation). 

WSLAP is now available to the community at: 

http://darwin.cfa.harvard.edu/SLAP/. 
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